Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add client interface. #16

Merged
merged 88 commits into from
Nov 30, 2024
Merged

Conversation

sitaowang1998
Copy link
Collaborator

@sitaowang1998 sitaowang1998 commented Oct 31, 2024

Description

Add client interface.

Validation performed

  • task:lint

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a comprehensive quick start guide for the Spider distributed task executor, detailing task creation, execution, and cluster setup.
    • Added a Data class for managing external data storage and a Driver class for client interactions with the Spider framework.
    • Implemented a Job class for tracking running tasks and a TaskContext class for managing task execution.
    • Added new concepts and type definitions for task management, enhancing the framework's functionality.
  • Bug Fixes

    • Updated task registration macros for consistency across the codebase.
  • Documentation

    • Enhanced documentation with new guides and concepts related to task management.
  • Refactor

    • Improved type handling in function signatures and reorganized include directives for better clarity and efficiency.

Copy link

coderabbitai bot commented Oct 31, 2024

Walkthrough

The pull request introduces a quick start guide for the Spider distributed task executor, detailing its core functionalities such as task creation, execution, and cluster setup. It adds several new classes for task management, data handling, and error handling, alongside modifications to the CMake configuration and formatting files. The task registration macro is renamed for consistency, and various include directives are reorganized for improved structure. These changes collectively enhance the usability and modularity of the Spider framework.

Changes

File Change Summary
docs/quick_start.md Introduced a quick start guide for Spider, detailing task creation, execution, cluster setup, and recovery features.
src/spider/.clang-format Updated regex in IncludeCategories to match multiple library headers.
src/spider/CMakeLists.txt Added new libraries spider_client_lib and spider_client, with updated linking for spider_worker.
src/spider/client/Data.hpp Added Data class for managing external data, including methods for locality and cleanup.
src/spider/client/Driver.hpp Introduced Driver class for job creation and key-value store access, with methods for task management.
src/spider/client/Exception.hpp Added custom exception handling with ConnectionException and DriverIdInUseException.
src/spider/client/Job.hpp Added Job class template for managing running tasks, including status and result retrieval methods.
src/spider/client/TaskContext.hpp Introduced TaskContext class for managing tasks, including methods for task management and job tracking.
src/spider/client/TaskGraph.hpp Added TaskGraph class to represent a directed acyclic graph of tasks.
src/spider/client/spider.hpp Added header guard and reorganized includes for better dependency management.
src/spider/client/task.hpp Introduced concepts and type definitions for task management, including TaskIo and TaskFunction.
src/spider/client/type_utils.hpp Added utility for checking type specializations with IsSpecialization.
src/spider/core/Data.hpp Reordered include statements for the Boost UUID library without functional changes.
src/spider/core/Serializer.hpp Activated UUID header and added serialization concepts for handling UUIDs.
src/spider/core/Task.hpp Reordered include statements without functional changes.
src/spider/core/TaskGraph.hpp Reordered include statements without functional changes.
src/spider/storage/DataStorage.hpp Reorganized include directives for Boost UUID library without functional changes.
src/spider/storage/MetadataStorage.hpp Reorganized include directives for Boost UUID library without functional changes.
src/spider/storage/MysqlStorage.cpp Reorganized include statements without functional changes.
src/spider/storage/MysqlStorage.hpp Reorganized include directives without functional changes.
src/spider/worker/FunctionManager.hpp Renamed task registration macro and updated type handling in signature struct template.
tests/worker/test-FunctionManager.cpp Updated test cases to use the new task registration macro while preserving existing test logic.

Possibly related PRs

  • feat: Add MySql support for storage backend #20: This PR introduces MySQL support for the storage backend, which is directly related to the main PR's focus on the Spider framework's task execution and data management, as it enhances the storage capabilities necessary for managing task states and results.

  • test: Add tests for data storage #21: This PR adds unit tests for data storage, which is relevant as it tests the functionalities introduced in the main PR, ensuring that the new task execution features work correctly with the data storage mechanisms.

  • test: Add unit tests for metadata storage and fix bugs in MySQL storage backend #23: This PR focuses on adding unit tests for metadata storage and fixing bugs in the MySQL storage backend, which is directly related to the main PR's enhancements in task management and execution, ensuring that the new features are robust and error-free.

  • feat: Add function manager for register and run function by function name #25: This PR introduces a FunctionManager that allows for the registration and execution of functions by name, which aligns with the main PR's goal of enhancing task execution in the Spider framework, as it provides a mechanism for managing task functions effectively.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit unclear on how all the APIs fit together concretely and I think someone who is seeing these headers for the first time would be a bit lost. I guess we will need to iterate on these a few times to get them to a point where they are clear (and meet the style guide). I think the first step is if you can add one or more readmes describing the layout of these APIs and how a user can use these APIs. Docstrings for each class will also help significantly.

Comment on lines 15 to 16
private:
std::unique_ptr<DataImpl> m_impl;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Private declarations should come after public.

src/spider/client/Data.hpp Show resolved Hide resolved
Comment on lines 20 to 21
* Gets the values stored in Data.
* @return value stored in Data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Gets the values stored in Data.
* @return value stored in Data.
* @return The stored value.

As our internal guidelines mention, for most getters, we can omit the docstring description and only describe the return value.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the special case. This is not an ordinary getter. It accesses data storage under the hood and could throw exception, or return error once we get error included in the interface.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since neither the exception or error are currently in the code, let's write the docstring according to what exists (not what will exist later). When we eventually add the error information, we can update the docstring appropriately.

Comment on lines 23 to 24
auto get() -> T;
/**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto get() -> T;
/**
auto get() -> T;
/**

Add an empty line between methods.

Comment on lines 24 to 29
/**
* Indicates that the data is persisted and should not be rollbacked
* on failure recovery.
*/
// Not implemented in milestone 1
// void mark_persist();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my experience, it's a bad idea to add code that we'll use in the future since that day may never come. If that day comes, we will remember what to do based on the design doc rather than this commented code.

* Sets the key for the data. If no key is provided, Spider generates a key.
* @param key of the data
*/
auto key(std::string const& key) -> Data<T>::Builder&;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return value is missing from docstring.

/**
* Sets locality list of the data.
* @param nodes nodes that has locality
* @param hard true if the locality list is a hard requirement, false otherwise
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These docstrings are a bit unclear to me. I guess the idea is that if hard=false, the data can be accessed from any node in the cluster, but if possible, it should be accessed from the nodes specified in this list?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Hard requirement means data can only be accessed on nodes in the locality list. Will change the docstring to make it clear.

* Sets the key for the data. If no key is provided, Spider generates a key.
* @param key of the data
*/
auto key(std::string const& key) -> Data<T>::Builder&;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be clearer to name all these setters as set_xxx where xxx is the thing being set.

// Not implemented for milestone 1
// auto rollback(std::function<const T&()> const& f) -> Data<T>::Builder&;
/**
* Builds the data. Stores the value of data into storage with locality list, persisted
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this class store the data?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internally all values are serialized and stored in data storage.


class DataImpl {};

template <class T>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should document template parameters as well.

Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed quickstart.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our naming convention is to use kebab-case for markdown files except for the main readme in a repo which is called README.md. So this file should be called quick-start.md. I've added this to our internal guidelines.

@@ -0,0 +1,204 @@
# Spider Quick Start Guide
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our convention is to use sentence case for headings rather than capitlizing every word. So this should be written as # Spider quick start guide.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed all titles, but I treat Spider as a special terminology and keep it capitalized.

# Spider Quick Start Guide

## Set Up Spider
To get started, first start a database supported by Spider, e.g. MySql. Second, start a scheduler and connect it to the database by running `spider start --scheduler --db <db_url> --port <scheduler_port>`. Third, start some workers and connect them to the database by running `spider start --worker --db <db_url>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to use lists rather than a block of text. Lists are faster to read than blocks of text.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a linter to do this later, but treat Markdown as code and wrap lines to 100 characters. This makes it easier to review and read (not everyone has a good Markdown renderer on their local machine).

# Spider Quick Start Guide

## Set Up Spider
To get started, first start a database supported by Spider, e.g. MySql. Second, start a scheduler and connect it to the database by running `spider start --scheduler --db <db_url> --port <scheduler_port>`. Third, start some workers and connect them to the database by running `spider start --worker --db <db_url>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have a caveat for starting the worker below. You should note that here otherwise someone will try to start the worker, it won't work, and they will give up.

}
```

However, it is impossible to get the return value of the task graph from a client. We have a solution by sharing data using key-value store, which will be discussed later. Another solution is to run task or task graph inside a task and wait for its value, just like a client. This solution is closer to the conventional function call semantic.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, it is impossible to get the return value of the task graph from a client.

This sentence doesn't make sense since the previous section has an example where the client is retrieving the result of a task graph.

Reading the later parts of this guide, I guess you mean that you can't return the value of a dynamically created task to the client (which is intuitive since the interface doesn't provide a mechanism to do so).


Spider lets user pass the metadata of these data around in `spider::Data` objects. `Data` stores the value of the metadata information of external data, and provides crucial information to Spider for correct and efficient scheduling and failure recovery. `Data` stores a list of nodes which has locality of the external data, and user can specify if locality is a hard requirement, i.e. task can only run on the nodes in locality list. `Data` can include a `cleanup`function, which will run when the `Data` object is no longer reference by any task and client. `Data` has a persist flag to represent that external data is persisted and do not need to be cleaned up.

```c++
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example is complicated and should definitely have a description. To not have a description means that we're slowing down the user since they first need to infer what the goal of the example's code is, then they can focus on the details of using the framework.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example is also a good reason why our internal guidelines say that we should organize things from public to private. If you apply that thinking here, we would put main first and then the task methods. That is the order that a user should read the methods (the opposite order requires the user to remember a lot of context before they get to main).

.build(HdfsFile { "/path/to/input" });
spider::Future<spider::Data<HdfsFile>> future = spider::run(
spider::bind(map, filter),
input);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

input is not a POD, so that violates the constraint mentioned above about task arguments being POD.

To get started, first start a database supported by Spider, e.g. MySql. Second, start a scheduler and connect it to the database by running `spider start --scheduler --db <db_url> --port <scheduler_port>`. Third, start some workers and connect them to the database by running `spider start --worker --db <db_url>`.

## Start a Client
Client first creates a Spider client driver and connects it to the database. Spider automatically cleans up the resource in driver's destructor, but you can close the driver to release the resource early.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have an intro section where we describe the actors and architecture in the system (client, scheduler, worker, database, etc.). It doesn't need to be too detailed (that can be in another doc), but it should provide enough context for the user to visualize what we're talking about.

## Note on Worker Setup
The setup section said that we can start a worker by running `spider start --worker --db <db_url>`. This is oversimplified. The worker has to know the function it will run.

When user compiles the client code, an executable and a library are generated. The executable executes the client code as expected. The library contains all the functions registered by user. Worker needs to run with a copy of this library. The actual commands to start a worker is `spider start --worker --db <db_url> --libs [client_libraries]`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, so in the guide above, wouldn't it be easier to just say that we need to create a task library with all the tasks the user wants to run? Then we can reference those tasks in the client code. Right now, the guide says we need to register the tasks by calling spider::register_task which sounds pretty opaque to me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User is responsible to create a task library and the client executable. Actually a good practice for user is to put the spider::register_task with the tasks' declaration instead of main, so maybe I should task registration into "Create a task section?

`spider::Data` that is directly used as input. Spider requires that the types of `Task` or
`TaskGraph` outputs or POD type or `spider::Data` matches the input types of child task.

Binding the tasks together forms a dependencies among tasks, which is represented by
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. How do you plan to communicate (serialize) the task calls (name + args) from the client to the server?
  2. Let's say a task takes two inputs. Does this interface support using a constant for one input and a task for the other?
    1. If so, do you support a task graph that looks like this?
      flowchart TD
          leaf["foo(int, int) -> int"]
          parent["bar(int, int) -> int"]
          3 --> leaf
          4 --> parent
          5 --> parent
          parent --> leaf
      
      Loading
    2. If so, that means any arguments the user passes into run get passed to the inputs in the task graph with a kind of DFS ordering. Is that true?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Here comes the use of spider::register_task, which records the mapping between function name and the function pointer. When the user calls run with the function pointer, the library actually sends the function name to the db. Worker also gets the name of the function as part of the task metadata. Since the worker links to the library with register_task call, it also has the mapping and can know which function to call.
    As for arguments, their types are stored in db (serialized as string right now), and values are serialized into string.
  2. Yes.
    i. Yes.
    ii. No. spider::bind needs to bind all inputs of the child task. Thus, the value 3 must be the an argument in bind. 4 and 5 can be passed in run. However, it is possible that there are multiple first layer tasks, i.e. tasks with no parents. The input of task graph is the product of all first layer tasks' output.
    It is possible to support passing 3 as an argument in run for the above example. We can have a special placeholder type for bind. In such case, DFS is an intuitive order.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Can you explain how:
    1. register_task maps a function pointer into a function name?
    2. run turns a function pointer into a function name?
  2. Can you explain this point with an example?

    However, it is possible that there are multiple first layer tasks, i.e. tasks with no parents. The input of task graph is the product of all first layer tasks' output.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. For client and worker, when the library is loaded, it will create a mapping between function name and function pointer.
    i. I am thinking of turning register_task into a macro to get the function name at compile time. It then stores the mapping between the function name to the function pointer.
    ii. run gets the function name from the mapping stored before.
  2. In the example below, both bar and baz has no parent, and task graph input is their inputs put together, i.e. 1, 2, 3 and 4.
graph TD
    1[1]
    2[2]
    3[3]
    4[4]
    foo["foo(int, int) -> int"]
    bar["bar(int, int) -> int"]
    baz["baz(int, int) -> int"]
    1 --> bar
    2 --> bar
    3 --> baz
    4 --> baz
    bar --> foo
    baz --> foo
Loading

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Makes sense.
  2. Gotcha. So to clarify my understanding:
    • in run, users can only pass arguments to root tasks (tasks with no parents).
    • in run, when the user passes a list of arguments [1, 2, 3, 4], conceptually:
      • the framework will iterate over the root tasks in order;
      • for each task argument, the framework will pop one argument from the list.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and yes.

… in client id

spider::run now returns a Job represents a running task or task graph,
so user can cancel the job.
User can now pass in client id when creating the driver, so the
recovered client can get all jobs by client id.
@kirkrodrigues
Copy link
Member

Can you make the changes to the header files that we discussed on Thursday?

Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some additional comments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this file doesn't correspond to a class, the filename should be lowercase.

#include <optional>
#include <string>

// NOLINTBEGIN(misc-include-cleaner)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be using IWYU's pragmas, right?

/**
* Initializes Spider library
*/
void init();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will this function do internally?

* Registers function to Spider
* @param function function to register
*/
template <class R, class... Args>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's replace R and Args with C++20 concepts that specify the exact properties of acceptable inputs. If I understand correctly, currently, return values and arguments can be a certain set of serializable values.

On that note though, if the requirement is that the value types need to be serializable, why don't we define an interface for serializable value types and then the user has the flexibility to use any serializable type they want? We could do this later, but conceptually, is it possible?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also use the same C++20 concepts wherever we have task inputs/outputs. This would also simplify the docstrings since we don't need to repeat what types are acceptable.

* @return job representing the running task
*/
template <class R, class... Args>
auto run(std::function<R(Args...)> const& task, Args&&... args) -> Job<R>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about renaming the run functions to start? run implies that the function won't return until the task/taskgraph completes.


// NOLINTEND(misc-include-cleaner)

namespace spider {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the functions in this file and those in KeyValueData, I feel like it makes more sense to have a Client/Driver class that encapsulates the functions (similar to Context). Otherwise, a user could call, for example, insert_kv without initializing a client and then it would fail or invoke undefined behaviour, right?

@@ -0,0 +1,61 @@
#ifndef SPIDER_CLIENT_FUTURE_HPP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double-check all the header guards.

*
* @tparam T type of the value. T must be a POD.
*/
template <class T>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like I mentioned in Spider.hpp, we can restrict the types using C++20 concepts.

CMakeLists.txt Outdated
Comment on lines 15 to 19
# AppleClang complains about file has no symbol and abort the build.
if(APPLE)
set(CMAKE_CXX_ARCHIVE_CREATE "<CMAKE_AR> Scr <TARGET> <LINK_FLAGS> <OBJECTS>")
set(CMAKE_CXX_ARCHIVE_FINISH "<CMAKE_RANLIB> -no_warning_for_no_symbols -c <TARGET>")
endif()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this anymore considering we've dropped support for building on macOS?

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
src/spider/client/Data.hpp (1)

46-74: Add validation and improve Builder documentation.

The Builder implementation could benefit from:

  1. Documentation about method chaining
  2. Validation for empty node vectors and null cleanup functions

Consider adding validation in set_locality:

auto set_locality(std::vector<std::string> const& nodes, bool hard) -> Builder& {
    if (nodes.empty()) {
        throw std::invalid_argument("Locality nodes vector cannot be empty");
    }
    // ... existing implementation
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 165eb84 and 488eaa3.

📒 Files selected for processing (2)
  • src/spider/client/Data.hpp (1 hunks)
  • src/spider/client/TaskContext.hpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/spider/client/TaskContext.hpp
🔇 Additional comments (5)
src/spider/client/Data.hpp (5)

1-10: LGTM! Header organization follows best practices.

The includes are well-organized and appropriate for the required functionality.


76-78: LGTM! Good use of pImpl idiom.

The use of pImpl pattern effectively hides implementation details and helps maintain ABI compatibility.


31-45: 🛠️ Refactor suggestion

Improve method documentation and ordering.

  1. The get() method should document potential errors or exceptions.
  2. The set_locality() parameter documentation should explain what nodes represents.
  3. Per Google C++ style guide, private declarations should come after public ones.

Consider adding error handling documentation:

/**
 * Retrieves the stored value.
 * @return The stored value.
 * @throws std::runtime_error If the data cannot be accessed or deserialized.
 */
auto get() -> T;

29-29: Verify Serializable concept implementation.

Let's ensure the Serializable concept is properly defined and documented.

✅ Verification successful

Serializable concept is properly implemented and used

The Serializable concept is well-defined in src/spider/core/Serializer.hpp as a composition of two concepts:

  • SerializableImpl: Requires types to support msgpack serialization
  • DeSerializableImpl: Requires types to support msgpack deserialization

The concept is appropriately used in the codebase:

  • As a template constraint for the Data class
  • As part of the TaskIo concept definition
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for Serializable concept definition
rg -A 5 "concept\s+Serializable"

# Search for any serialization-related functions that types must implement
ast-grep --pattern 'template <$_>
concept Serializable = $_'

Length of output: 592


Script:

#!/bin/bash
# Let's also check for DeSerializableImpl concept definition
rg -A 5 "concept\s+DeSerializableImpl"

# And check where this Serializable concept is being used
rg "Serializable\s*[<\s]" -A 2

Length of output: 930


14-29: ⚠️ Potential issue

Fix inconsistencies in example code.

The example code has several issues:

  1. set_locality example uses a single string, but the method expects a vector
  2. There's a typo in cont& instead of const&
  3. The cleanup function signature doesn't match the method declaration

Apply this diff to fix the example:

-         .set_locality({ "node_address" }, true)
-         .set_cleanup_func([](std::string cont& path) { std::remove(path); })
+         .set_locality(std::vector<std::string>{"node_address"}, true)
+         .set_cleanup_func([](std::string const& path) { std::remove(path.c_str()); })

Likely invalid or redundant comment.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (3)
src/spider/client/Data.hpp (3)

29-30: Consider adding error handling mechanism.

The Serializable concept constraint is good, but the class should provide a way to handle potential serialization errors.

Consider adding error handling through either:

  1. Exception handling
  2. std::expected (C++23) or tl::expected
  3. Error codes

37-44: Enhance documentation for locality parameters.

The parameter documentation could be more descriptive:

  • What constitutes a valid node address?
  • What happens if an invalid node is specified?
  • What are the implications of hard vs soft locality?

46-74: Consider adding noexcept specifications and move operations.

The Builder implementation could be enhanced:

  1. Mark methods as noexcept where applicable
  2. Add move operations for better performance
  3. Consider making cleanup function noexcept

Example improvement:

auto set_cleanup_func(std::function<void(T const&) noexcept> f) noexcept -> Builder&;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 488eaa3 and 4d2aa6c.

📒 Files selected for processing (6)
  • src/spider/client/Data.hpp (1 hunks)
  • src/spider/client/Driver.hpp (1 hunks)
  • src/spider/client/Job.hpp (1 hunks)
  • src/spider/client/TaskContext.hpp (1 hunks)
  • src/spider/client/task.hpp (1 hunks)
  • src/spider/client/type_utils.hpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • src/spider/client/Driver.hpp
  • src/spider/client/Job.hpp
  • src/spider/client/TaskContext.hpp
  • src/spider/client/task.hpp
  • src/spider/client/type_utils.hpp
🔇 Additional comments (2)
src/spider/client/Data.hpp (2)

1-10: LGTM! Well-organized includes and proper header guards.

The includes are properly organized with standard libraries first, followed by local includes, and all are necessary for the implementation.


20-25: ⚠️ Potential issue

Fix inconsistencies in the example code.

The example code doesn't match the actual method signatures:

  1. The cleanup function lambda has incorrect signature
  2. The locality parameter is passed as a single string instead of a vector

Apply this diff to fix the example:

 auto disk_file_data = spider::Data<std::string>::Builder()
-         .set_locality({"node_address"}, true)
-         .set_cleanup_func([](std::string cont& path) { std::remove(path); })
+         .set_locality(std::vector<std::string>{"node_address"}, true)
+         .set_cleanup_func([](std::string const& path) { std::remove(path.c_str()); })
          .build("/path/of/file");

Likely invalid or redundant comment.

Comment on lines +76 to +78
private:
std::unique_ptr<DataImpl> m_impl;
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add destructor declaration for pImpl idiom.

When using the pImpl idiom with std::unique_ptr, you need to declare the destructor in the header and define it in the implementation file where the complete type is available.

Add the following declaration:

~Data();  // Declaration only, define in .cpp file

Comment on lines 1 to 81
#ifndef SPIDER_CLIENT_DATA_HPP
#define SPIDER_CLIENT_DATA_HPP

#include <functional>
#include <memory>
#include <string>
#include <vector>

#include "../core/Serializer.hpp"

namespace spider {
class DataImpl;

/**
* A representation of data stored on external storage. This class allows the user to define:
* - how the data should be cleaned up (garbage collected) once it is no longer referenced.
* - the locality of the data.
*
* Example:
* @code{.cpp}
* auto disk_file_data = spider::Data<std::string>::Builder()
* .set_locality({"node_address"}, true)
* .set_cleanup_func([](std::string cont& path) { std::remove(path); })
* .build("/path/of/file");
* @endcode
*
* @tparam T Type of the value.
*/
template <Serializable T>
class Data {
public:
/**
* @return The stored value.
*/
auto get() -> T;

/**
* Sets the data's locality, indicated by the nodes that contain the data.
*
* @param nodes
* @param hard Whether the data is only accessible from the given nodes (i.e., the locality is a
* hard requirement).
*/
void set_locality(std::vector<std::string> const& nodes, bool hard);

class Builder {
public:
/**
* Sets the data's locality, indicated by the nodes that contain the data.
*
* @param nodes
* @param hard Whether the data is only accessible from the given nodes (i.e., the locality
* is a hard requirement.
* @return self
*/
auto set_locality(std::vector<std::string> const& nodes, bool hard) -> Builder&;

/**
* Sets the cleanup function for the data. This function will be called when the data is no
* longer referenced.
*
* @param f
* @return self
*/
auto set_cleanup_func(std::function<void(T const&)> const& f) -> Builder&;

/**
* Builds the data object.
*
* @param t Value of the data
* @return The built object.
*/
auto build(T const& t) -> Data;
};

private:
std::unique_ptr<DataImpl> m_impl;
};
} // namespace spider

#endif // SPIDER_CLIENT_DATA_HPP
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Template implementation file missing

The code appears to be a header-only template library where:

  • Data.hpp in the client directory defines the template interface
  • Data.hpp in the core directory contains a different concrete implementation
  • The template implementation is missing, and DataImpl is only forward-declared

Required actions:

  • Implement the template methods for Data<T> either in the header file or in a separate .ipp file
  • Provide the complete implementation of the DataImpl class
🔗 Analysis chain

Verify implementation file exists.

Ensure that all declared methods are implemented in the corresponding .cpp file.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for the implementation file
fd "Data\.(cpp|cc|cxx)$" src/spider/client/

# Check if all methods are implemented
rg "Data<.*>::\w+" src/spider/client/

Length of output: 174


Script:

#!/bin/bash
# Let's check if there's a template implementation in the header file itself
rg -U "template.*class Data.*\{.*\}" -A 100 src/spider/client/Data.hpp

# Also check if there's any inline implementation
rg "inline.*Data<.*>::\w+" src/spider/client/

# Check for potential implementation in other directories
fd "Data(Impl)?\.(cpp|cc|cxx|hpp|h)$" src/spider/

Length of output: 224


Script:

#!/bin/bash
# Let's examine the core/Data.hpp to understand if it contains the implementation
cat src/spider/core/Data.hpp

# Also check for DataImpl definition
rg "class DataImpl" src/spider/

Length of output: 2017

Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • All of the Driver and TaskContext methods should throw an exception if they can't connect to the storage component, right? If so, can we document that with @throw?
  • Do we need any of the forward declared XxxImpl classes in this PR?
  • I'm still working on the quick-start guide, so in the interest of time, can we remove it from this pull request and then I'll submit another pull request to add it?

* fails and then needs to reconnect and retrieve all previously created jobs. NOTE: It is
* undefined behaviour for two clients to concurrently use the same ID.
*/
Driver(std::string const& storage_url, boost::uuids::uuid id);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should throw an exception if the client ID is already in use, right?

#include "TaskGraph.hpp"

namespace spider {
class ContextImpl;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this in this PR?

@sitaowang1998
Copy link
Collaborator Author

  1. We can add throw, and remove it later if we have the error code ready. Is there other exception that should be thrown other than connection error?
  2. I'll remove all pImpl forward declaration.
  3. I'll remove the quick start guide.

@kirkrodrigues
Copy link
Member

  1. As far as I can tell, just the connection error one and the client ID already being in use.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
src/spider/client/Data.hpp (2)

37-44: Enhance locality documentation

The documentation should clarify:

  1. The implications of hard vs soft locality
  2. The behaviour when accessing data from non-specified nodes
  3. Whether locality can be changed after data is created

46-74: Enhance Builder implementation

Consider the following improvements:

  1. Add move semantics support
  2. Add parameter validation
  3. Make Builder moveable but non-copyable
class Builder {
public:
    Builder() = default;
    Builder(Builder&&) noexcept = default;
    Builder& operator=(Builder&&) noexcept = default;
    Builder(const Builder&) = delete;
    Builder& operator=(const Builder&) = delete;
    // ... existing methods ...
private:
    void validate_nodes(std::vector<std::string> const& nodes) {
        if (nodes.empty()) {
            throw std::invalid_argument("Nodes vector cannot be empty");
        }
    }
};
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4d2aa6c and 88ed638.

📒 Files selected for processing (1)
  • src/spider/client/Data.hpp (1 hunks)
🔇 Additional comments (3)
src/spider/client/Data.hpp (3)

20-25: Update example code to match method signatures.

The example code doesn't match the actual method signatures:

  • set_locality expects a vector of strings, not a single string
  • set_cleanup_func signature is void(T const&), which matches the example but differs from what was previously discussed
- auto disk_file_data = spider::Data<std::string>::Builder()
-         .set_locality({"node_address"}, true)
-         .set_cleanup_func([](std::string const& path) { std::filesystem::remove(path); })
-         .build("/path/of/file");
+ auto disk_file_data = spider::Data<std::string>::Builder()
+         .set_locality(std::vector<std::string>{"node_address"}, true)
+         .set_cleanup_func([](std::string const& path) { std::filesystem::remove(path); })
+         .build("/path/of/file");

76-78: Add destructor declaration for pImpl idiom

When using the pImpl idiom with std::unique_ptr, you need to declare the destructor in the header.

 private:
     std::unique_ptr<DataImpl> m_impl;
+    ~Data();

29-30: Template implementation file missing

The template implementation is missing. Consider:

  1. Adding implementation in a separate .ipp file
  2. Including the .ipp file at the end of this header

Comment on lines +32 to +35
/**
* @return The stored value.
*/
auto get() -> T;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider adding error handling to get()

The get() method might fail when accessing the underlying storage. Consider:

  1. Adding error handling using std::expected (C++23) or Result type
  2. Documenting potential failure scenarios

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (3)
src/spider/client/Exception.hpp (1)

11-30: Consider implementing a base exception class.

To improve maintainability and reduce code duplication, consider implementing a base Spider exception class that handles common functionality like message formatting and storage. This would also make it easier to add new exception types in the future.

Example structure:

class SpiderException : public std::exception {
protected:
    explicit SpiderException(std::string message) : m_message(std::move(message)) {}
    mutable std::string m_message;

public:
    [[nodiscard]] const char* what() const noexcept override {
        return m_message.c_str();
    }
};

class ConnectionException final : public SpiderException {
public:
    explicit ConnectionException(std::string addr)
        : SpiderException(fmt::format("Cannot connect to storage {}.", addr)) {}
};
src/spider/client/Data.hpp (1)

37-44: Improve parameter names and documentation for set_locality

The parameter names and documentation could be more descriptive:

  • Rename nodes to node_addresses for clarity
  • Expand documentation to explain the implications of hard vs soft locality requirements
-     * @param nodes
+     * @param node_addresses List of node addresses where the data is stored
-     * @param hard Whether the data is only accessible from the given nodes (i.e., the locality is a
-     * hard requirement).
+     * @param hard If true, data can only be accessed from the specified nodes.
+     *             If false, data can be accessed from any node but preferably from the specified nodes.
src/spider/client/Job.hpp (1)

12-14: Address the TODO: Implement error handling with std::expected or Boost.Outcome

The TODO comment suggests enhancing error handling by using std::expected or Boost.Outcome. Implementing this would allow users to retrieve job results in a more robust and less error-prone manner, simplifying the API.

Would you like assistance in implementing this enhancement or opening a GitHub issue to track this task?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 88ed638 and 73eabef.

📒 Files selected for processing (6)
  • src/spider/client/Data.hpp (1 hunks)
  • src/spider/client/Driver.hpp (1 hunks)
  • src/spider/client/Exception.hpp (1 hunks)
  • src/spider/client/Job.hpp (1 hunks)
  • src/spider/client/TaskContext.hpp (1 hunks)
  • src/spider/client/TaskGraph.hpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/spider/client/Driver.hpp
  • src/spider/client/TaskContext.hpp
  • src/spider/client/TaskGraph.hpp
🔇 Additional comments (7)
src/spider/client/Exception.hpp (1)

1-10: LGTM! Header guards and includes are properly implemented.

The header guards follow the standard naming convention, and all necessary headers are included for the required functionality.

src/spider/client/Data.hpp (4)

20-25: Update example code to match method signatures.

The example code doesn't match the actual method signatures:

  • set_cleanup_func expects void(T const&) but example shows filesystem operation
  • set_locality is shown with a single string but expects vector

32-35: Consider adding error handling to get()

The method should handle potential failures when accessing the underlying storage. Consider using std::expected (C++23) or a Result type.


78-80: Add destructor declaration for pImpl idiom

When using the pImpl idiom with std::unique_ptr, you need to declare the destructor in the header file.


29-80: Template implementation file missing

The template implementation is missing:

  • Methods are declared but not implemented
  • Implementation must be available at compile time
src/spider/client/Job.hpp (2)

60-60: Consider adding runtime checks to prevent undefined behaviour

As previously noted, the methods get_result() and get_error() may cause undefined behaviour if called when the job is not in the correct state. Adding runtime checks to throw exceptions in these methods would improve robustness.

Also applies to: 71-71


36-36: Provide implementations for template methods to prevent linker errors

As previously mentioned, the methods declared in the Job template class lack implementations. This could lead to linker errors. Consider implementing these methods inline or in an included template implementation file.

Also applies to: 43-43, 50-50, 60-60, 71-71

Comment on lines 12 to 18
class ConnectionException final : std::exception {
public:
auto what() -> std::string { return fmt::format("Cannot connect to storage {}.", m_addr); }

private:
std::string m_addr;
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Critical issues in ConnectionException implementation.

The class has several issues that need to be addressed:

  1. Inheritance from std::exception should be public
  2. what() method doesn't properly override std::exception::what()
  3. Missing constructor to initialize m_addr
  4. Missing address parameter in error message

Apply this diff to fix the issues:

-class ConnectionException final : std::exception {
+class ConnectionException final : public std::exception {
 public:
-    auto what() -> std::string { return fmt::format("Cannot connect to storage {}.", m_addr); }
+    explicit ConnectionException(std::string addr) : m_addr(std::move(addr)) {}
+
+    [[nodiscard]] const char* what() const noexcept override {
+        m_what = fmt::format("Cannot connect to storage {}.", m_addr);
+        return m_what.c_str();
+    }

 private:
     std::string m_addr;
+    mutable std::string m_what;  // Mutable to allow modification in const what()
 };
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
class ConnectionException final : std::exception {
public:
auto what() -> std::string { return fmt::format("Cannot connect to storage {}.", m_addr); }
private:
std::string m_addr;
};
class ConnectionException final : public std::exception {
public:
explicit ConnectionException(std::string addr) : m_addr(std::move(addr)) {}
[[nodiscard]] const char* what() const noexcept override {
m_what = fmt::format("Cannot connect to storage {}.", m_addr);
return m_what.c_str();
}
private:
std::string m_addr;
mutable std::string m_what; // Mutable to allow modification in const what()
};

Comment on lines 20 to 28
class DriverIdUsedException final : std::exception {
public:
auto what() -> std::string {
return fmt::format("Driver id {} already used.", boost::uuids::to_string(m_id));
}

private:
boost::uuids::uuid m_id;
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Critical issues in DriverIdUsedException implementation.

The class has similar issues to ConnectionException:

  1. Inheritance from std::exception should be public
  2. what() method doesn't properly override std::exception::what()
  3. Missing constructor to initialize m_id

Apply this diff to fix the issues:

-class DriverIdUsedException final : std::exception {
+class DriverIdUsedException final : public std::exception {
 public:
-    auto what() -> std::string {
-        return fmt::format("Driver id {} already used.", boost::uuids::to_string(m_id));
-    }
+    explicit DriverIdUsedException(boost::uuids::uuid id) : m_id(id) {}
+
+    [[nodiscard]] const char* what() const noexcept override {
+        m_what = fmt::format("Driver id {} already used.", boost::uuids::to_string(m_id));
+        return m_what.c_str();
+    }

 private:
     boost::uuids::uuid m_id;
+    mutable std::string m_what;  // Mutable to allow modification in const what()
 };
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
class DriverIdUsedException final : std::exception {
public:
auto what() -> std::string {
return fmt::format("Driver id {} already used.", boost::uuids::to_string(m_id));
}
private:
boost::uuids::uuid m_id;
};
class DriverIdUsedException final : public std::exception {
public:
explicit DriverIdUsedException(boost::uuids::uuid id) : m_id(id) {}
[[nodiscard]] const char* what() const noexcept override {
m_what = fmt::format("Driver id {} already used.", boost::uuids::to_string(m_id));
return m_what.c_str();
}
private:
boost::uuids::uuid m_id;
mutable std::string m_what; // Mutable to allow modification in const what()
};


/**
* NOTE: It is undefined behavior to call this method for a job that is not in the `Succeed`
* state.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Correct the job state names in the documentation

There are inconsistencies in the documentation comments regarding job state names:

  • In line 54, 'Succeed' should be 'Succeeded' to match the enum JobStatus::Succeeded.
  • In line 63, 'Fail' should be 'Failed' to match the enum JobStatus::Failed.

Apply the following changes to correct the documentation:

- * NOTE: It is undefined behaviour to call this method for a job that is not in the `Succeed` state.
+ * NOTE: It is undefined behaviour to call this method for a job that is not in the `Succeeded` state.
- * NOTE: It is undefined behaviour to call this method for a job that is not in the `Fail` state.
+ * NOTE: It is undefined behaviour to call this method for a job that is not in the `Failed` state.

Also applies to: 63-63

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still need to remove this, right?

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
src/spider/client/Exception.hpp (2)

14-15: Consider using std::string_view for better performance.

The constructor could be more efficient by accepting a std::string_view instead of const std::string&, especially since the parameter is only used for formatting.

-    explicit ConnectionException(std::string const& addr)
+    explicit ConnectionException(std::string_view addr)

26-28: Improve message formatting consistency.

The message formatting spans multiple lines with inconsistent indentation. Consider reformatting for better readability:

-            : m_message(
-                      fmt::format("Driver ID {} is currently in use.", boost::uuids::to_string(id))
-              ) {}
+            : m_message(fmt::format("Driver ID {} is currently in use.",
+                                   boost::uuids::to_string(id))) {}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 61be939 and 5bffeee.

📒 Files selected for processing (6)
  • src/spider/client/Data.hpp (1 hunks)
  • src/spider/client/Driver.hpp (1 hunks)
  • src/spider/client/Exception.hpp (1 hunks)
  • src/spider/client/Job.hpp (1 hunks)
  • src/spider/client/TaskContext.hpp (1 hunks)
  • src/spider/client/TaskGraph.hpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • src/spider/client/Data.hpp
  • src/spider/client/Driver.hpp
  • src/spider/client/Job.hpp
  • src/spider/client/TaskContext.hpp
  • src/spider/client/TaskGraph.hpp
🔇 Additional comments (2)
src/spider/client/Exception.hpp (2)

1-10: LGTM! Header guards and includes are well-organized.

The includes are properly organized with standard library headers first, followed by third-party libraries. All necessary dependencies are included.


11-35: Well-designed exception hierarchy!

The implementation follows good C++ practices with:

  • Proper inheritance from std::exception
  • Consistent design patterns across exception classes
  • Thread-safe message handling
  • Clear and informative error messages

@kirkrodrigues kirkrodrigues changed the title feat: Add client interface feat: Add client interface. Nov 30, 2024
@sitaowang1998 sitaowang1998 merged commit 326e2f9 into y-scope:main Nov 30, 2024
4 checks passed
@sitaowang1998 sitaowang1998 deleted the interface branch November 30, 2024 03:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants